vulnerability discovery


Hybrid Fuzzing with LLM-Guided Input Mutation and Semantic Feedback

Lin, Shiyin

arXiv.org Artificial Intelligence

Software fuzzing has become a cornerstone of automated vulnerability discovery, yet existing mutation strategies often lack semantic awareness, leading to redundant test cases and slow exploration of deep program states. In this work, I present a hybrid fuzzing framework that integrates static and dynamic analysis with Large Language Model (LLM)-guided input mutation and semantic feedback. Static analysis extracts control-flow and data-flow information, which is transformed into structured prompts for the LLM to generate syntactically valid and semantically diverse inputs. During execution, I augment traditional coverage-based feedback with semantic feedback signals, derived from program state changes, exception types, and output semantics, allowing the fuzzer to prioritize inputs that trigger novel program behaviors beyond mere code coverage. I implement the approach atop AFL++, combining program instrumentation with embedding-based semantic similarity metrics to guide seed selection. Evaluation on real-world open-source targets, including libpng, tcpdump, and sqlite, demonstrates that the method achieves faster time-to-first-bug, higher semantic diversity, and a competitive number of unique bugs compared to state-of-the-art fuzzers. This work highlights the potential of combining LLM reasoning with semantic-aware feedback to accelerate and deepen vulnerability discovery.
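The seed-selection idea in this abstract can be sketched as follows: each executed input is summarized by a behavior vector, and seeds whose vectors are far from everything already seen are fuzzed first. This is a minimal illustration of the concept, not the paper's implementation; the function names, the cosine metric, and the `vec` field are assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two behavior vectors (0.0 if either is zero).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_novelty(candidate_vec, seen_vecs):
    """1.0 = completely novel behavior, 0.0 = identical to a known one."""
    if not seen_vecs:
        return 1.0
    return 1.0 - max(cosine(candidate_vec, v) for v in seen_vecs)

def prioritize(seeds, seen_vecs):
    """Order seeds so the most behaviorally novel inputs are fuzzed first."""
    return sorted(seeds,
                  key=lambda s: semantic_novelty(s["vec"], seen_vecs),
                  reverse=True)
```

In a real fuzzer the behavior vector might encode the exception type, output features, and observed state changes mentioned in the abstract; here it is just a list of floats.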


LLM-based Vulnerability Discovery through the Lens of Code Metrics

Weissberg, Felix, Pirch, Lukas, Imgrund, Erik, Möller, Jonas, Eisenhofer, Thorsten, Rieck, Konrad

arXiv.org Artificial Intelligence

Large language models (LLMs) excel in many tasks of software engineering, yet progress in leveraging them for vulnerability discovery has stalled in recent years. To understand this phenomenon, we investigate LLMs through the lens of classic code metrics. Surprisingly, we find that a classifier trained solely on these metrics performs on par with state-of-the-art LLMs for vulnerability discovery. A root-cause analysis reveals a strong correlation and a causal effect between LLMs and code metrics: When the value of a metric is changed, LLM predictions tend to shift by a corresponding magnitude. This dependency suggests that LLMs operate at a similarly shallow level as code metrics, limiting their ability to grasp complex patterns and fully realize their potential in vulnerability discovery. Based on these findings, we derive recommendations on how research should more effectively address this challenge.
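The baseline the authors compare against can be pictured with a toy sketch: score a function using only classic code metrics. The metric set, the McCabe-style approximation, and the decision stump below are illustrative assumptions, not the paper's actual pipeline.

```python
def max_nesting(src: str) -> int:
    # Deepest brace nesting level in a C-like source string.
    depth = peak = 0
    for ch in src:
        if ch == "{":
            depth += 1
            peak = max(peak, depth)
        elif ch == "}":
            depth -= 1
    return peak

def code_metrics(src: str) -> dict:
    # Classic metrics: lines of code, branch count, nesting depth.
    lines = [l for l in src.splitlines() if l.strip()]
    branches = sum(src.count(k) for k in ("if", "for", "while", "case", "&&", "||"))
    return {
        "loc": len(lines),
        "cyclomatic": branches + 1,  # McCabe-style approximation
        "max_nesting": max_nesting(src),
    }

def looks_risky(src: str, loc_t=30, cc_t=10) -> bool:
    """Toy decision stump standing in for a trained metrics classifier."""
    m = code_metrics(src)
    return m["loc"] > loc_t or m["cyclomatic"] > cc_t
```

The paper's point is that if LLM predictions track such shallow signals, a simple classifier over these features can match them.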


AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models

Qiu, Le, Xu, Zelai, Tan, Qixin, Tang, Wenhao, Yu, Chao, Wang, Yu

arXiv.org Artificial Intelligence

Assessing the safety of autonomous driving policies is of great importance, and reinforcement learning (RL) has emerged as a powerful method for discovering critical vulnerabilities in driving policies. However, existing RL-based approaches often struggle to identify vulnerabilities that are both effective, meaning the autonomous vehicle is genuinely responsible for the accident, and diverse, meaning they span various failure types. To address these challenges, we propose AED, a framework that uses large language models (LLMs) to automatically discover effective and diverse vulnerabilities in autonomous driving policies. We first utilize an LLM to automatically design reward functions for RL training. Then we let the LLM consider a diverse set of accident types and train adversarial policies for different accident types in parallel. Finally, we use preference-based learning to filter ineffective accidents and enhance the effectiveness of each vulnerability. Experiments across multiple simulated traffic scenarios and tested policies show that AED uncovers a broader range of vulnerabilities and achieves higher attack success rates compared with expert-designed rewards, thereby reducing the need for manual reward engineering and improving the diversity and effectiveness of vulnerability discovery.
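The kind of reward an LLM might propose for one target accident type can be sketched as below: reward collisions of the desired type while penalizing the adversary's own rule violations, so blame stays with the policy under test. The state fields and constants are hypothetical placeholders, not AED's actual reward design.

```python
def adversarial_reward(state: dict, accident_type: str) -> float:
    """Toy per-accident-type reward for training an adversarial driver.

    Hypothetical fields: 'collision', 'collision_type',
    'adversary_violation', 'distance_to_av'.
    """
    r = 0.0
    # Reward only the targeted accident type, encouraging diversity
    # when separate policies are trained for different types in parallel.
    if state["collision"] and state["collision_type"] == accident_type:
        r += 10.0
    # Penalize the adversary breaking traffic rules, so a resulting
    # crash is plausibly the tested vehicle's fault ("effective").
    if state["adversary_violation"]:
        r -= 5.0
    # Dense shaping term: close the distance to the autonomous vehicle.
    r -= 0.1 * state["distance_to_av"]
    return r
```

AED additionally filters ineffective accidents with preference-based learning after training, which this sketch does not model.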


Pitfalls in Machine Learning for Computer Security

Communications of the ACM

We identify ten pitfalls, the don'ts of machine learning in security, and propose corresponding dos as actionable recommendations to support researchers in avoiding the pitfalls where possible. Furthermore, we identify open problems that cannot be mitigated easily and require further research effort (§2).


Will AI Make Cyber Swords or Shields: A few mathematical models of technological progress

Lohn, Andrew J, Jackson, Krystal Alex

arXiv.org Artificial Intelligence

Predicting the impact of advances in technology may be a fool's errand, but it is a necessary one nonetheless to help guide research and funding toward efforts that benefit defense more than offense. In this paper, we try to mathematically model the impact of further advancement in several critical aspects of cybersecurity. Perhaps more important than any of the forewarnings or funding recommendations we arrive at, this approach strives to sharpen debates about AI's impact on cybersecurity. This is the companion paper for a separate report, published by CSET and titled "Will AI Make Cyber Swords or Shields," illustrating the value of rigor in policy discussions about technological advancement. There is too much uncertainty to believe that the math gives precise projections, but it forces us to be precise in our assumptions. Reasonable people may disagree with the range of values we choose as inputs or even the models we use. We welcome those disagreements and hope they advance our collective understanding of how AI may change the future of cybersecurity. Following this introduction, we proceed with separate analyses of three areas of cybersecurity: 1) phishing, 2) vulnerability discovery, and 3) the dynamics between patching and exploitation.
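A minimal model in the spirit of the patching-versus-exploitation analysis might look like the following. It assumes two independent exponential "find" processes, which is an illustrative assumption rather than the authors' actual model: if defenders rediscover a bug at rate d and attackers at rate a, the chance the patch lands first is d / (d + a), so an AI tool that accelerates discovery simply multiplies whichever side's rate its users control.

```python
def defender_wins_probability(defense_rate: float, attack_rate: float) -> float:
    """P(defender finds and patches a bug before attackers exploit it),
    under independent exponential discovery processes."""
    return defense_rate / (defense_rate + attack_rate)

# Example: with equal rates the race is a coin flip; doubling the
# defender's rate with an AI assistant moves the odds to 2/3.
```

Even a model this crude forces the debate onto explicit inputs: whose discovery rate does a new AI capability multiply, and by how much?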


Towards Learning Representations of Binary Executable Files for Security Tasks

Arakelyan, Shushan, Hauser, Christophe, Kline, Erik, Galstyan, Aram

arXiv.org Machine Learning

Tackling binary analysis problems has traditionally implied manually defining rules and heuristics. As an alternative, we suggest using machine learning models to learn distributed representations of binaries that are applicable to a number of downstream tasks. We construct a computational graph from the binary executable and use it with a graph convolutional neural network to learn a high-dimensional representation of the program. We show the versatility of this approach by using our representations to solve two semantically different binary analysis tasks -- algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results and demonstrate improvement over state-of-the-art methods for both tasks.
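The core mechanism can be sketched in plain Python: nodes of the graph built from the binary (e.g., instructions or basic blocks) carry feature vectors, each message-passing step mixes a node's features with its neighbors', and pooling the final node states yields one vector per program. This is a toy mean-aggregation version without learned weights or nonlinearities; graph construction and features are placeholders, not the paper's architecture.

```python
def gcn_layer(adj, feats):
    """One message-passing step: each node averages itself and its neighbors.

    adj is an n x n 0/1 adjacency matrix; feats is a list of n vectors.
    """
    n, dim = len(feats), len(feats[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j]] + [i]  # add self-loop
        out.append([sum(feats[j][k] for j in neigh) / len(neigh)
                    for k in range(dim)])
    return out

def graph_embedding(adj, feats, layers=2):
    """Stack layers, then mean-pool node states into one program vector."""
    h = feats
    for _ in range(layers):
        h = gcn_layer(adj, h)
    dim = len(h[0])
    return [sum(node[k] for node in h) / len(h) for k in range(dim)]
```

A trained version would interleave learned weight matrices and activations between layers; the resulting program vector then feeds a classifier for tasks like algorithm classification or vulnerability discovery.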